model design
Token Is All You Price
We develop a mechanism design framework in which a platform designs GenAI models to screen users who obtain instrumental value from the generated conversation and privately differ in their preference for latency. We show that the revenue-optimal mechanism is simple: deploy a single aligned (user-optimal) model and use a token cap as the only screening instrument. The design decouples model training from pricing, is readily implemented with token metering, and mitigates misalignment pressures.
- Europe > Kosovo > District of Gjilan > Kamenica (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
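The token-cap screening idea above can be illustrated with a standard hidden-type menu sketch. Everything here is a hypothetical two-type toy, not the paper's model: I assume a user of type theta values a cap of q tokens at theta·sqrt(q), the platform pays a per-token cost c, and posts a menu of (cap, price) plans where the low type's participation constraint and the high type's incentive constraint bind.

```python
import math

# Hypothetical two-type screening toy (NOT the paper's model): a user of
# type theta values a cap of q tokens at theta * sqrt(q); the platform
# pays c per token and posts a menu of (cap, price) plans.
def optimal_menu(theta_L=1.0, theta_H=1.5, n_L=0.5, n_H=0.5, c=0.1, max_cap=200):
    best = None
    for qL in range(1, max_cap + 1):
        for qH in range(qL, max_cap + 1):
            # classic binding constraints: IR for the low type, IC for the high type
            pL = theta_L * math.sqrt(qL)
            pH = theta_H * math.sqrt(qH) - (theta_H - theta_L) * math.sqrt(qL)
            profit = n_L * (pL - c * qL) + n_H * (pH - c * qH)
            if best is None or profit > best[0]:
                best = (profit, (qL, pL), (qH, pH))
    return best

profit, (qL, pL), (qH, pH) = optimal_menu()
# The low type's cap is distorted below its efficient level, while the
# high type's cap stays efficient: the caps alone separate the two types.
```

With these toy parameters the high type's cap sits at its first-best level (sqrt(q) = theta_H / 2c) while the low type's cap is pushed well below first-best, which is the usual screening distortion the abstract's "token cap as the only instrument" result relies on.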
On What Depends the Robustness of Multi-source Models to Missing Data in Earth Observation?
Mena, Francisco, Arenas, Diego, Miranda, Miro, Dengel, Andreas
Francisco Mena 1,2, Diego Arenas 2, Miro Miranda 1,2, and Andreas Dengel 1,2 (1 University of Kaiserslautern-Landau (RPTU), Kaiserslautern, Germany; 2 German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany). Abstract -- In recent years, robust multi-source models have emerged in the Earth Observation (EO) field: models that leverage data from diverse sources to improve predictive accuracy when data are missing. Despite these advancements, the factors behind the varying effectiveness of such models remain poorly understood. In this study, we evaluate the predictive performance of six state-of-the-art multi-source models in scenarios where either a single data source is missing or only a single source is available. Our analysis reveals that the efficacy of these models is intricately tied to the nature of the task, the complementarity among data sources, and the model design. Surprisingly, we observe instances where removing certain data sources improves predictive performance, challenging the assumption that incorporating all available data is always beneficial.
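The single-source-missing evaluation can be made concrete with a minimal ablation harness. This is a toy stand-in, not the paper's models or EO data: I assume two synthetic "sources" of unequal informativeness and a fixed linear scorer, and zero-fill one source at test time to measure the degradation.

```python
import random

# Toy ablation harness (not the paper's setup): the label depends on two
# "sources"; at test time we zero-fill one source's features and
# measure how much accuracy degrades.
random.seed(0)

def sample(n):
    data = []
    for _ in range(n):
        s1 = random.gauss(0, 1)               # e.g. an optical feature
        s2 = random.gauss(0, 1)               # e.g. a radar feature
        y = 1 if s1 + 0.3 * s2 > 0 else 0     # source 1 dominates the label
        data.append((s1, s2, y))
    return data

def accuracy(data, use_s1=True, use_s2=True):
    correct = 0
    for s1, s2, y in data:
        score = (s1 if use_s1 else 0.0) + 0.3 * (s2 if use_s2 else 0.0)
        correct += int((1 if score > 0 else 0) == y)
    return correct / len(data)

test = sample(5000)
full = accuracy(test)
no_s2 = accuracy(test, use_s2=False)   # drop the weaker source
no_s1 = accuracy(test, use_s1=False)   # drop the dominant source
# Dropping the dominant source hurts far more than dropping the weaker
# one: "robustness to missing data" depends on source complementarity.
```

The gap between `no_s1` and `no_s2` is the toy analogue of the paper's finding that robustness depends on how much unique signal each source carries, not just on how many sources the model sees.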
Cardiomyopathy Diagnosis Model from Endomyocardial Biopsy Specimens: Appropriate Feature Space and Class Boundary in Small Sample Size Data
Mori, Masaya, Omae, Yuto, Koyama, Yutaka, Hara, Kazuyuki, Toyotani, Jun, Okumura, Yasuo, Hao, Hiroyuki
As the number of patients with heart failure increases, machine learning (ML) has garnered attention in cardiomyopathy diagnosis, driven by the shortage of pathologists. However, endomyocardial biopsy specimens often have small sample sizes and require techniques such as feature extraction and dimensionality reduction. This study aims to determine whether texture features are effective for feature extraction in the pathological diagnosis of cardiomyopathy. Furthermore, model designs that contribute toward improving generalization performance are examined by applying feature selection (FS) and dimensional compression (DC) to several ML models. The results were verified by visualizing inter-class distribution differences and conducting statistical hypothesis testing based on texture features. Additionally, they were evaluated using predictive performance across different model designs with varying combinations of FS and DC (applied or not) and decision boundaries. The results confirmed that texture features may be effective for the pathological diagnosis of cardiomyopathy. Moreover, when the ratio of features to sample size is high, a multi-step process involving FS and DC improved generalization performance, with the linear kernel support vector machine achieving the best results. This process was demonstrated to be potentially effective for models with reduced complexity, regardless of whether the decision boundaries were linear, curved, perpendicular, or parallel to the axes. These findings are expected to facilitate the development of an effective cardiomyopathy diagnostic model for its rapid adoption in medical practice.
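The multi-step FS-then-DC idea can be sketched on toy data. This is a minimal stand-in, not the paper's pipeline: I assume synthetic features where only the first few carry class signal, select features by class-mean separation, compress them to one dimension by averaging, and apply a simple linear threshold.

```python
import random

# Minimal sketch of the multi-step idea (feature selection, then
# dimensional compression, then a linear boundary) on toy data; the
# paper's actual texture features, FS/DC methods, and models differ.
random.seed(1)

def make_sample(label, n_feat=50, n_informative=5):
    # only the first few features carry class signal; the rest are noise
    return [random.gauss(label if i < n_informative else 0.0, 1.0)
            for i in range(n_feat)], label

train = [make_sample(random.choice([0, 1])) for _ in range(200)]

def select_features(data, k=5):
    # FS step: rank features by absolute class-mean separation
    n_feat = len(data[0][0])
    scores = []
    for i in range(n_feat):
        m0 = [x[i] for x, y in data if y == 0]
        m1 = [x[i] for x, y in data if y == 1]
        gap = abs(sum(m1) / len(m1) - sum(m0) / len(m0))
        scores.append((gap, i))
    return [i for _, i in sorted(scores, reverse=True)[:k]]

def compress(x, idx):
    # DC step: crude 1-D compression by averaging the selected features
    return sum(x[i] for i in idx) / len(idx)

idx = select_features(train)
threshold = 0.5  # midpoint of the two class means (0 and 1)

def predict(x):
    return 1 if compress(x, idx) > threshold else 0

acc = sum(predict(x) == y for x, y in train) / len(train)
```

With 50 features and 200 samples the feature-to-sample ratio is high, which is exactly the regime where the abstract reports that an FS-then-DC pipeline helps a low-complexity linear classifier generalize.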
Review for NeurIPS paper: A Causal View on Robustness of Neural Networks
Additional Feedback: Given fundamental limits of network robustness to adversarial attacks (see "Limitations of Adversarial Robustness: Strong No Free Lunch Theorem"), where does the proposed method differ from, or relate to, that general framework for robustness and adversaries? Does the causality framework provide a "way out" from the bounds and limits shown in that work? The lack of robustness to horizontal and vertical shift in the MNIST example seems as coupled to the architectural bias of the particular discriminator design as to the task itself; for example, an object detection framework such as RCNN or modern variants (e.g., Mask-RCNN) should have little issue with the shifted-image task described in the paper. How can we separate the issue of network design (which is frequently driven by known invariances in the desired domain, such as moving from simple DNNs to more applicable CNNs) from the causal manipulation model (which also has design parameters and potential pitfalls, as discussed in 3.2 and 4.2)? If using some kind of automated network design setting (such as meta-learning or evolutionary approaches), would the CAMA model design and the discriminator need to be designed in conjunction, or through some kind of back-and-forth iteration?
Vertical LoRA: Dense Expectation-Maximization Interpretation of Transformers
In recent years, the field of machine learning, especially natural language processing (NLP), has witnessed a transformative evolution, primarily catalyzed by the advent of Transformer models and large language models. These models are known for their emergent ability to comprehend and generate human-like text. Specifically, Transformer models seem to undergo a transformative evolution as parameter counts grow, achieving unprecedented performance across a spectrum of tasks, including text generation, machine translation, text summarization, question answering, and visual understanding. This finding has led to a trend of scaling models up to millions and even billions of parameters, exemplified by OpenAI's GPT[1, 2], Google's BERT[3], Meta's Llama[4], and Anthropic's Claude[5]. However, this growth in model size has simultaneously raised a significant barrier for ordinary individuals seeking to train these models on consumer hardware.
- North America > United States > New York (0.04)
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
Digital Business Model Analysis Using a Large Language Model
Watanabe, Masahiro, Uchihira, Naoshi
Digital transformation (DX) has recently become a pressing issue for many companies as the latest digital technologies, such as artificial intelligence and the Internet of Things, can be easily utilized. However, devising new business models is not easy for companies, even though they can improve their operations through digital technologies. Thus, business model design support methods are needed by people who lack digital technology expertise. Meanwhile, large language models (LLMs), represented by ChatGPT, and natural language processing utilizing LLMs have developed rapidly. A business model design support system that utilizes these technologies has great potential. However, research on this area is scant. Accordingly, this study proposes an LLM-based method for comparing and analyzing similar companies from different business domains as a first step toward business model design support utilizing LLMs. This method can support idea generation in digital business model design.
- Health & Medicine (0.40)
- Information Technology (0.36)
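The cross-domain comparison step could be driven by a simple prompt template. This is a hypothetical sketch only; the study's actual prompts, comparison criteria, and company pairs are not specified here.

```python
# Hypothetical prompt template for the cross-domain comparison step;
# the criteria listed below are illustrative assumptions, not the
# study's actual analysis dimensions.
TEMPLATE = (
    "Compare the business models of {a} and {b}.\n"
    "For each, describe: customer segments, value proposition, revenue streams.\n"
    "Then list structural similarities that could transfer across domains."
)

def build_prompt(company_a, company_b):
    return TEMPLATE.format(a=company_a, b=company_b)

prompt = build_prompt("a ride-sharing platform", "a food-delivery platform")
# The filled prompt would be sent to an LLM; its answer seeds idea
# generation for a new digital business model.
```

Keeping the comparison dimensions explicit in the template is what makes the LLM's output comparable across company pairs, which is the premise of using such analyses as design support.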
Novel Approaches for ML-Assisted Particle Track Reconstruction and Hit Clustering
Odyurt, Uraz, Dobreva, Nadezhda, Wolffs, Zef, Zhao, Yue, Sánchez, Antonio Ferrer, Bazan, Roberto Ruiz de Austri, Martín-Guerrero, José D., Varbanescu, Ana-Lucia, Caron, Sascha
Track reconstruction is a vital aspect of High-Energy Physics (HEP) and plays a critical role in major experiments. In this study, we delve into unexplored avenues for particle track reconstruction and hit clustering. Firstly, we enhance the algorithmic design effort by utilising a simplified simulator (REDVID) to generate training data that is specifically composed for simplicity. We demonstrate the effectiveness of this data in guiding the development of optimal network architectures. Additionally, we investigate the application of image segmentation networks for this task, exploring their potential for accurate track reconstruction. Moreover, we approach the task from a different perspective by treating it as a hit sequence to track sequence translation problem. Specifically, we explore the utilisation of Transformer architectures for tracking purposes. Our preliminary findings are covered in detail. By considering this novel approach, we aim to uncover new insights and potential advancements in track reconstruction. This research sheds light on previously unexplored methods and provides valuable insights for the field of particle track reconstruction and hit clustering in HEP.
- Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
- North America > Cuba > Artemisa Province > Artemisa (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Research Report > Promising Solution (0.60)
- Overview > Innovation (0.60)
- Research Report > New Finding (0.54)
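The "hit sequence to track sequence translation" framing above can be sketched as a tokenization scheme. This is a hypothetical illustration, not the paper's encoding: I assume hit coordinates are discretized into a small vocabulary so an off-the-shelf seq2seq Transformer could consume them, with one track-id token per hit as the target.

```python
# Hypothetical sketch of the "hits -> tracks as translation" framing:
# discretise each hit's coordinates into vocabulary tokens so a
# standard seq2seq model (e.g. a Transformer) can consume them.
BINS = 100

def hit_to_tokens(x, y, z, lo=-1.0, hi=1.0):
    def bucket(v):
        v = min(max(v, lo), hi)
        return int((v - lo) / (hi - lo) * (BINS - 1))
    return [f"x{bucket(x)}", f"y{bucket(y)}", f"z{bucket(z)}"]

def encode_event(hits):
    # source sequence: flattened hit tokens separated by a marker token
    seq = []
    for h in hits:
        seq += hit_to_tokens(*h) + ["<hit>"]
    return seq

def target_sequence(track_ids):
    # target sequence: one track-id token per hit, in the same order
    return [f"t{t}" for t in track_ids]

src = encode_event([(0.1, -0.2, 0.5), (0.11, -0.19, 0.52)])
tgt = target_sequence([3, 3])
```

Under this framing, training reduces to ordinary sequence-to-sequence learning: the model reads the discretized hit tokens and emits the track assignment for each hit, so nearby hits (like the two above) should map to the same track token.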
AI Fairness in Practice
Leslie, David, Rincon, Cami, Briggs, Morgan, Perini, Antonella, Jayadeva, Smera, Borda, Ann, Bennett, SJ, Burr, Christopher, Aitken, Mhairi, Katell, Michael, Fischer, Claudia, Wong, Janis, Garcia, Ismael Kherroubi
Reaching consensus on a commonly accepted definition of AI Fairness has long been a central challenge in AI ethics and governance. There is a broad spectrum of views across society on what the concept of fairness means and how it should best be put to practice. In this workbook, we tackle this challenge by exploring how a context-based and society-centred approach to understanding AI Fairness can help project teams better identify, mitigate, and manage the many ways that unfair bias and discrimination can crop up across the AI project workflow. We begin by exploring how, despite the plurality of understandings about the meaning of fairness, priorities of equality and non-discrimination have come to constitute the broadly accepted core of its application as a practical principle. We focus on how these priorities manifest in the form of equal protection from direct and indirect discrimination and from discriminatory harassment. These elements form ethical and legal criteria based upon which instances of unfair bias and discrimination can be identified and mitigated across the AI project workflow. We then take a deeper dive into how the different contexts of the AI project lifecycle give rise to different fairness concerns. This allows us to identify several types of AI Fairness (Data Fairness, Application Fairness, Model Design and Development Fairness, Metric-Based Fairness, System Implementation Fairness, and Ecosystem Fairness) that form the basis of a multi-lens approach to bias identification, mitigation, and management. Building on this, we discuss how to put the principle of AI Fairness into practice across the AI project workflow through Bias Self-Assessment and Bias Risk Management as well as through the documentation of metric-based fairness criteria in a Fairness Position Statement.
- Europe > United Kingdom > Wales (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (9 more...)
- Workflow (1.00)
- Research Report > Experimental Study (1.00)
- Instructional Material > Course Syllabus & Notes (0.67)
- Law > Civil Rights & Constitutional Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- (10 more...)
Leveraging Open Information Extraction for Improving Few-Shot Trigger Detection Domain Transfer
Dukić, David, Gashteovski, Kiril, Glavaš, Goran, Šnajder, Jan
Event detection is a crucial information extraction task in many domains, such as Wikipedia or news. The task typically relies on trigger detection (TD) -- identifying token spans in the text that evoke specific events. While the notion of triggers should ideally be universal across domains, domain transfer for TD from high- to low-resource domains results in significant performance drops. We address the problem of negative transfer for TD by coupling triggers between domains using subject-object relations obtained from a rule-based open information extraction (OIE) system. We demonstrate that relations injected through multi-task training can act as mediators between triggers in different domains, enhancing zero- and few-shot TD domain transfer and reducing negative transfer, in particular when transferring from a high-resource source Wikipedia domain to a low-resource target news domain. Additionally, we combine the extracted relations with masked language modeling on the target domain and obtain further TD performance gains. Finally, we demonstrate that the results are robust to the choice of the OIE system.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- (10 more...)
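The rule-based OIE component above can be illustrated with a toy pattern matcher. Real OIE systems are far more sophisticated; this sketch only shows the kind of (subject, relation, object) triple that would be injected as an auxiliary signal, and the pattern below is my assumption, not the system the paper uses.

```python
import re

# Toy pattern-based extraction in the spirit of rule-based OIE: match
# simple "<Subject> <verb> <object>." sentences and return a triple.
# Real OIE systems handle far richer syntax than this single pattern.
PATTERN = re.compile(r"^(?P<subj>[A-Z][\w ]*?) (?P<rel>\w+ed|\w+s) (?P<obj>[\w ]+)\.$")

def extract_triple(sentence):
    m = PATTERN.match(sentence)
    if not m:
        return None
    return (m.group("subj"), m.group("rel"), m.group("obj"))

triple = extract_triple("Protesters stormed the parliament.")
# -> the relation "stormed" overlaps with an event trigger, which is
# what lets relations mediate between triggers across domains.
```

In a multi-task setup, triples like this would feed an auxiliary relation-prediction head alongside the trigger detection head; because relation verbs often coincide with triggers, the shared encoder learns trigger-relevant structure that transfers from the high-resource to the low-resource domain.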
A Field Guide to Scientific XAI: Transparent and Interpretable Deep Learning for Bioinformatics Research
Quinn, Thomas P, Gupta, Sunil, Venkatesh, Svetha, Le, Vuong
Deep learning has become popular because of its potential to achieve high accuracy in prediction tasks. However, accuracy is not always the only goal of statistical modelling, especially for models developed as part of scientific research. Rather, many scientific models are developed to facilitate scientific discovery, by which we mean to abstract a human-understandable representation of the natural world. Unfortunately, the opacity of deep neural networks limits their role in scientific discovery, creating a new demand for models that are transparently interpretable. This article is a field guide to transparent model design. It provides a taxonomy of transparent model design concepts, a practical workflow for putting design concepts into practice, and a general template for reporting design choices. We hope this field guide will help researchers more effectively design transparently interpretable models, and thus enable them to use deep learning for scientific discovery.
- Oceania > Australia (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Montenegro (0.04)